[Figure 1 panels: (a) Our HDR result; (b) Single homography; (c) Bao et al.; (d) Our result]

Figure 1: When capturing HDR stacks without a tripod, parallax and non-rigid scene changes are the main sources of artifacts. The picture 
in (a) is an HDR image generated by our algorithm from a stack of two pictures taken with a hand-held camera (notice that the volleyball 
is hand-held as well). A common and efficient method to register the images is to use a single homography, but parallax will still cause 
ghosting artifacts, see (b). One can then resort to non-rigid registration methods; here we use the fastest method of which we are aware, 
but artifacts due to erroneous registration are still visible (c). Our method is several times faster and, for scenes with parallax and small 
non-rigid displacements, produces better results (d). 


Abstract 

Image registration for stack-based HDR photography is challenging. If not properly accounted for, camera motion and scene changes result in artifacts in the composite image. Unfortunately, existing methods to address this problem are either accurate, but too slow for mobile devices, or fast, but prone to failing. We propose a method that fills this void: our approach is extremely fast (under 700ms on a commercial tablet for a pair of 5MP images) and prevents the artifacts that arise from insufficient registration quality.


1. Introduction 

High-Dynamic-Range (HDR) imaging has become an essential feature for camera phones and point-and-shoot cameras; even some DSLR cameras now offer it as a shooting mode. To date, the most popular strategy for capturing HDR images is to take multiple pictures of the same scene with different exposure times, which is usually referred to as stack-based HDR. Over the last decade, the research community also proposed several hardware solutions to sidestep the need for multiple images, but the trade-off between cost and picture quality still makes stack-based strategies more appealing to camera manufacturers.

Combining multiple low-dynamic-range (LDR) images into a single HDR irradiance map is relatively straightforward, provided that each pixel samples the exact same irradiance in each picture of the stack. In practice, however, any viable strategy to merge LDR images needs to cope with both camera motion and scene changes. Indeed there is a rich literature on the subject, with different methods offering a different compromise between computational complexity and reconstruction accuracy.

On one end of the spectrum there are light-weight methods, generally well-suited to run on mobile devices. These methods address the problem of camera motion by estimating a global transformation in a robust fashion [19, 17]. After image alignment, scene changes can be addressed with some flavor of outlier rejection, often called deghosting. This can be achieved by picking one image of the stack to act as a reference and only merging consistent pixels from the other images [4, 15]. Alternatively, for sufficiently large stacks, one can merge only the irradiance values most often seen for a given pixel [20, 14]. The price for the computational efficiency of rigid-registration methods is their severe limitation in terms of accuracy: even the most general global transformation, i.e., a homography, cannot correct for parallax, which occurs for non-planar scenes every time the camera undergoes even a small amount of translation.

On the other end of the spectrum lie methods that allow for a completely non-rigid transformation between the images in the stack [16, 9]. Rather than separating camera motion and scene changes, these algorithms attempt to “move” any given pixel in one shot of the stack to its corresponding location in the reference image. These methods have shown impressive results, essentially with any amount and type of motion in the scene, but generally require minutes on desktop computers: they are simply impractical for deployment on mobile devices.

We fill the gap between these two extreme points in the space of registration accuracy versus computational complexity. Our work builds on the observation that most modern devices, such as the NVIDIA SHIELD Tablet or the Google Nexus 6 phone, are capable of streaming full-resolution YUV frames at 30fps. Given an exposure time t, the delay between consecutive shots is then (33 − t)ms < 33ms,¹ which prevents large displacements of moving objects. Parallax, however, remains an issue even for small camera translations. Figure 1(b) shows the extent of these artifacts. (Note that, for better visualization, all the insets in Figure 1 were generated with a simple blending.)

Our method comprises a strategy to find sparse correspondences that is particularly well-suited for HDR stacks, where large parts of some images are extremely dark. After detecting spurious matches, it corrects for parallax by propagating the displacement computed at the discrete locations in an edge-aware fashion. The proposed algorithm can also correct for small non-rigid motions, but may fail for cases where the subject simply cannot be expected to cooperate, as is the case in sports photography. To address such cases, we couple our locally non-rigid registration algorithm with a modified version of the exposure fusion algorithm [13].

Our method runs in 677ms on a stack of two 5MP images on a mobile device, which is several orders of magnitude faster than any non-rigid registration method of which we are aware. On a desktop machine, our method can register the same stack in 150ms, corresponding to a speedup of roughly 11× over the fastest optical flow methods published to date (see Section 3).

2. Method 

Several recently published methods successfully tackled the task of non-rigid registration for large displacements by using approximate nearest neighbor fields (NNFs) [9, 16]. While the quality of the results they produce is impressive, even in the case of very large displacements, their computational cost is prohibitive, in particular for mobile devices. Moreover, given the frame rate at which bursts of images can be acquired, the tolerance to large displacements that those methods offer is most often unnecessary.

¹ If t > 33ms, the limiting factor will likely be blur, and the delay between shots will still be (33 − mod(33, t))ms < 33ms.

The problem of large displacements is further attenuated by the dynamic range of modern sensors, which allows most scenes to be captured with only two shots. Leveraging this observation, we focus on two-image exposure stacks, although the extension to more images only requires running the algorithm n − 1 times for a stack of n images, as shown in Figure 5, where we register a stack of three images. Rather than computing an expensive NNF, which, for the vast majority of stacks, would mostly consist of small and relatively uniform displacements, we find sparse correspondences between the two images. While extremely fast, the matcher we designed for this purpose produces accurate matches, even in extremely dark regions, which is a particularly important feature for HDR stacks. To solve the parallax problem, rather than registering the images with a single homography, we propose to propagate the sparse flow from the matches computed in the previous stage in an edge-aware fashion. To merge the images we modified exposure fusion [13] to compensate for potential errors in the computation of the flow. We implemented the full pipeline (stack capture, image registration, and image fusion) on an NVIDIA SHIELD Tablet.

In the remainder of this section we describe in detail the 
different components of our algorithm. 

2.1. Stack capture and reference selection 

Metering for HDR, i.e., the selection of the exposure times and number of pictures required to sample the irradiance distribution for a particular scene, has been an active area of research [5, 7, 8]. We observe that the dynamic range of modern sensors allows most real-world scenes to be captured with as few as two exposures, and devise a simple strategy that works well in our experiments: we use the Expose-To-The-Right (ETTR) paradigm [8] for the first image in the stack, and select the second exposure time to be 2, 3, or 4 stops brighter, based on the number of under-exposed pixels (the more under-exposed pixels, the longer the second exposure). Limiting the number of candidate exposures to three allows for faster metering; moreover, the advantage of a higher granularity is difficult to appreciate by visually inspecting the HDR result.
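
The metering rule above can be sketched as follows. This is only an illustration: the paper does not give the thresholds, so `dark_level`, `few` and `many` are hypothetical values, as are the function names.

```python
def second_exposure_stops(luma, dark_level=16, few=0.05, many=0.25):
    """Choose how many stops brighter the second shot should be (2, 3, or 4).

    luma is a 2D list of 8-bit luminance values from the ETTR reference.
    dark_level, few and many are hypothetical thresholds, not from the paper.
    """
    pixels = [v for row in luma for v in row]
    dark_fraction = sum(1 for v in pixels if v < dark_level) / len(pixels)
    if dark_fraction < few:
        return 2
    if dark_fraction < many:
        return 3
    return 4


def second_exposure_time(reference_time_ms, stops):
    """Each stop doubles the exposure time."""
    return reference_time_ms * 2 ** stops
```

With only three candidates, the decision reduces to counting dark pixels once and comparing the fraction against two thresholds.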

Then, rather than picking the reference image for our 
registration algorithm to be the one with the least saturated 
and underexposed pixels [4], we always use the shortest 
(darkest) exposure as the reference; this is because, while 
the noise in the dark regions of a scene makes it difficult to 
find reliable matches, saturation makes it impossible. In the 
rest of the paper we will refer to the two images in the stack 
as reference and source, indicating our final goal to warp 
the source to the reference. 




Figure 2: After dividing the reference image into tiles, our matcher looks at the center of each tile (red dot) and a set of predefined locations around it (green dots). It then computes a measure of cornerness based on the average luminance in the four quadrants around each candidate corner.

2.2. A fast, robust matcher 

In order to produce fast and reliable correspondences, even in the presence of the noise the potentially short exposure time induces, we propose a novel matcher. To this end, we efficiently find distinct features, i.e., corners, in the reference image, which we then match in the second image with a patch-based search. First, we define a simple measure of cornerness:

$$C(\mathbf{p}) = \sum_{j=1}^{4} \left| \mu_{\mathrm{mod}(j+1,4)} - \mu_j \right| \qquad (1)$$

where $\mathbf{p}$ is the pixel location, and $\mu_j$ is the average luminance value in the $j$-th quadrant around $\mathbf{p}$, marked in black in Figure 2. Equation 1 simply measures the change in the average luminance in the four quadrants around $\mathbf{p}$. However, for $\mathbf{p}$ to be a good corner candidate, we also require that the minimum difference of average luminance between any two contiguous quadrants be large:

$$\min_{j} \left| \mu_{\mathrm{mod}(j+1,4)} - \mu_j \right| > T \qquad (2)$$

where $T$ is a given threshold. Essentially, Equation 2 prevents situations in which a point on an edge is promoted to a corner. Note that, regardless of the number of pixels in each quadrant, $\mu_j$ can be computed very efficiently using integral images.
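
As a minimal sketch of Equations 1 and 2, the quadrant means can be read from an integral image in constant time. The quadrant geometry (four square blocks of side `r` meeting at the candidate pixel), the function names, and returning 0 when Equation 2 fails are assumptions made for illustration.

```python
def integral_image(img):
    """Summed-area table with one extra row and column of zeros."""
    h, w = len(img), len(img[0])
    ii = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0.0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_mean(ii, x0, y0, x1, y1):
    """Mean over columns [x0, x1) and rows [y0, y1), using four lookups."""
    total = ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]
    return total / ((x1 - x0) * (y1 - y0))


def cornerness(ii, x, y, r, T):
    """Equation 1, gated by Equation 2; returns 0.0 if the gate fails."""
    # Quadrants listed in cyclic order so that consecutive ones are contiguous.
    mu = [
        box_mean(ii, x - r, y - r, x, y),  # top-left
        box_mean(ii, x, y - r, x + r, y),  # top-right
        box_mean(ii, x, y, x + r, y + r),  # bottom-right
        box_mean(ii, x - r, y, x, y + r),  # bottom-left
    ]
    diffs = [abs(mu[(j + 1) % 4] - mu[j]) for j in range(4)]
    if min(diffs) <= T:
        return 0.0
    return sum(diffs)
```

On a straight edge through the candidate, two of the four contiguous differences vanish, so the minimum in Equation 2 is zero and the point is rejected.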

To encourage a uniform distribution of corners over the image, we first divide the reference image into tiles. Then, we look at a predefined set of locations (and not at all pixels, for computational efficiency) around the center of each tile (Figure 2), and retain the one with the highest cornerness value. Note that Equation 2 also prevents finding corners in flat regions; therefore, tiles that are completely flat may not be assigned a corner.


Although the proposed algorithm to search for corners is extremely efficient, we do not run it on both reference and source images, as it only serves as a feature detector. Hence, we evaluate corners in the reference image only, and then search for matches within a predefined search radius around the corresponding position in the source image using the sum of squared differences metric (SSD).
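
The patch search can be sketched as an exhaustive SSD scan around an initial guess. This is an illustrative sketch, not the authors' implementation: it uses a square search window rather than a circular one, and the names `ssd` and `best_match` are hypothetical.

```python
def ssd(ref, src, rx, ry, sx, sy, half):
    """Sum of squared differences between two (2*half+1)^2 patches."""
    total = 0.0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            d = ref[ry + dy][rx + dx] - src[sy + dy][sx + dx]
            total += d * d
    return total


def best_match(ref, src, corner, guess, radius, half):
    """Scan a square window of the given radius around guess in src.

    Returns the (x, y) position with the lowest SSD, or None if no
    candidate patch fits inside the source image.
    """
    rx, ry = corner
    gx, gy = guess
    h, w = len(src), len(src[0])
    best = None
    for sy in range(gy - radius, gy + radius + 1):
        for sx in range(gx - radius, gx + radius + 1):
            if not (half <= sx < w - half and half <= sy < h - half):
                continue  # patch would fall outside the image
            cost = ssd(ref, src, rx, ry, sx, sy, half)
            if best is None or cost < best[0]:
                best = (cost, (sx, sy))
    return best[1] if best else None
```

The cost grows with the square of the search radius, which is one reason a good initial guess for the search location matters.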

Finally, to minimize the computational cost while allowing for a large search radius, we use a pyramidal approach. At level $\ell$ of the pyramid, we find a set of corners $\mathbf{x}_{\mathrm{ref}}^{\ell}$ in the reference. For each corner we search for matches in the source image within a search radius around

$$\mathbf{x}_{\mathrm{src}}^{\ell} = H^{\ell+1}\, \mathbf{x}_{\mathrm{ref}}^{\ell} \qquad (3)$$

where $H^{\ell+1}$ is a single homography computed at the previous layer of the pyramid using its corners and matches. Equation 3 holds if both $\mathbf{x}_{\mathrm{src}}$ and $\mathbf{x}_{\mathrm{ref}}$ are represented in a normalized coordinate system such that $x \in [-1, 1]$ and $y \in [-h/w, h/w]$, where $\mathbf{x} = (x, y)$, and $w$ and $h$ are the width and height of the image. We also move the origin to the center of the image. This homography only serves as a way to initialize the search locations in the source image (and in turn reduces the required search radius compared to a random initialization). Note that the corners $\mathbf{x}_{\mathrm{ref}}^{\ell}$ are computed directly on layer $\ell$, and not upsampled from layer $\ell + 1$, as relevant features at layer $\ell$ may not have been present at layer $\ell + 1$ and may not be visible in layer $\ell - 1$.
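
The coordinate normalization is what lets a homography estimated at one pyramid level seed the search at another. A minimal sketch follows, assuming pixel coordinates in [0, w] × [0, h]; the function names and the exact pixel-to-normalized mapping are illustrative assumptions.

```python
def to_normalized(px, py, w, h):
    """Map pixels to x in [-1, 1], y in [-h/w, h/w], origin at the center."""
    return (2.0 * px - w) / w, (2.0 * py - h) / w


def to_pixels(x, y, w, h):
    """Inverse of to_normalized."""
    return (x * w + w) / 2.0, (y * w + h) / 2.0


def apply_homography(H, x, y):
    """Apply a 3x3 homography (nested lists) to a 2D point."""
    d = H[2][0] * x + H[2][1] * y + H[2][2]
    return (
        (H[0][0] * x + H[0][1] * y + H[0][2]) / d,
        (H[1][0] * x + H[1][1] * y + H[1][2]) / d,
    )


def initial_guess(H_coarser, px, py, w, h):
    """Predict where a reference corner lands in the source at this level."""
    x, y = to_normalized(px, py, w, h)
    xs, ys = apply_homography(H_coarser, x, y)
    return to_pixels(xs, ys, w, h)
```

Because the normalized frame does not depend on resolution, the same matrix yields a shift that scales with the image width: a translation of 0.1 normalized units is 32 pixels at a width of 640 and 16 pixels at a width of 320.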

2.3. Weeding out spurious matches 

The matcher we describe in Section 2.2 generally produces robust matches even in very low light conditions (see Section 3 for a more detailed evaluation). However, to detect and remove potential spurious matches, we run an additional filtering stage. The key idea is that we require matches to be locally consistent with a homography; those that are not are likely to be incorrect, because we expect small displacements. The consistent matches can be determined by means of a robust method, such as RANSAC [3]. A straightforward application of RANSAC, however, is too expensive. Instead, we developed a fast filtering strategy to weed out spurious matches that are not consistent with a homography.

The goal of our novel filter is to efficiently obtain the set 
M of reliable matches. We consider a match to be reliable if 
there exists a large number of matches that are mapped from 
the source image to the reference image by the same local 
homography H ; in other words, we aim at finding all the 
matches that induce any homography supported by a large 
set of inliers. 

The filter, inspired by RANSAC, works iteratively. Specifically, at the $i$-th iteration, we randomly sample 4 points from our set of matches, fit a homography $H^i$, and find the subset of inliers $I^i$ with respect to $H^i$. Then, rather than saving the homography supported by the largest set of inliers, we simply update $M$ using the following rule:

$$M^{i+1} = \begin{cases} M^{i} \cup I^{i} & \text{if } |I^{i}| > \theta \\ M^{i} & \text{otherwise} \end{cases} \qquad (4)$$

where $|\cdot|$ indicates the cardinality of a set, and $\theta$ is a threshold. To understand the idea behind Equation 4, consider a toy scene where most of the corners are distributed on two static planar surfaces at different distances from the camera, with a few others moving non-rigidly. We would like our algorithm to weed out the non-rigid corners, as well as the corners on the two surfaces that are incorrectly matched. At each iteration, one of two things can happen. First, the sampling may include corners from both planes or those moving non-rigidly; the resulting homography will have a small number of inliers, and the set of reliable matches $M$ will not be modified. Second, all of the points sampled at the current iteration belong to one of the two planes. In this case the resulting homography will explain the motion of all the corners that are on the same plane and that move rigidly; the set of reliable matches $M$ will be updated to include these inliers. Note that we do not remove the inliers of the $i$-th iteration from the original set of corners.
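
The update rule of Equation 4 can be sketched with a pluggable motion model. To keep the example short, the demo model below is a one-point translation, which is a stand-in and not the 4-point homography the paper fits; all function names, the tolerance `tol`, and the match layout `((x, y), (xs, ys))` are assumptions.

```python
import math
import random


def filter_matches(matches, fit, residual, sample_size, tol, theta,
                   iterations, rng):
    """Union the inlier sets of every sampled model with enough support.

    Returns the indices of the matches considered reliable (the set M).
    """
    reliable = set()
    for _ in range(iterations):
        sample = rng.sample(matches, sample_size)
        model = fit(sample)
        inliers = {i for i, m in enumerate(matches)
                   if residual(model, m) <= tol}
        if len(inliers) > theta:  # Equation 4: grow M only with good support
            reliable |= inliers
    return reliable


def fit_translation(sample):
    """Stand-in model: the displacement of a single match."""
    (x, y), (xs, ys) = sample[0]
    return xs - x, ys - y


def translation_residual(model, match):
    """Distance between a match's displacement and the model's."""
    (x, y), (xs, ys) = match
    return math.hypot(xs - x - model[0], ys - y - model[1])
```

For the two-plane toy scene, a call such as `filter_matches(matches, fit_translation, translation_residual, 1, 0.5, 5, 200, random.Random(0))` keeps the matches of both planes, since each plane's model gathers many inliers, while an isolated outlier only ever supports itself and is never added.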

We further speed up the process by running n instances of our filter on separate threads, with each instance running only 1/n of the iterations; because we take the union of the acceptable inliers from previous iterations, running our filter n times for 1/n of the iterations each is exactly equivalent to a single run with all the iterations. After each run k terminates, we simply merge the resulting sets $M_k$.
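
Because the result is a union, splitting the iterations across workers only requires merging the partial sets at the end. The sketch below assumes a hypothetical callback `run_filter(iterations, seed)` that returns a set of reliable match indices; in Python, an actual speedup would additionally require the callback to release the interpreter lock, for example by running native code.

```python
from concurrent.futures import ThreadPoolExecutor


def run_filter_in_parallel(run_filter, total_iterations, n_threads):
    """Run n_threads filter instances and merge their reliable sets."""
    per_thread = total_iterations // n_threads
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # Each worker gets its own seed so that the samples differ.
        partial_sets = list(
            pool.map(lambda k: run_filter(per_thread, k), range(n_threads))
        )
    merged = set()
    for part in partial_sets:
        merged |= part
    return merged
```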

2.4. Sparse-to-dense flow 

So far, we have described a method to efficiently find a set of robust matches between the images, constituting a sparse flow. To be able to warp the source image to the reference, however, we need to compute the displacement at every pixel. A simple interpolation of the sparse flow would produce artifacts similar to those caused by using a single homography: depth discontinuities and boundaries of moving objects would not be aligned accurately.

Instead, we would like to interpolate the sparse flow in an edge-aware fashion. The problem is similar to that of image colorization [10], where colors are propagated from a handful of sparse pixels that have been assigned a color manually. In our case, we propagate the flow components $(u, v)$ computed at discrete locations.

For this purpose, we employ an efficient CUDA implementation of the algorithm proposed by Gastal and Oliveira [6], and use it to cross-bilateral filter the flow. Similarly to how they propose to propagate colors, we first create two maps $P_u$ and $P_v$:

$$P_f(\mathbf{p}) = \begin{cases} f(\mathbf{p}) & \text{if } \mathbf{p} \text{ is a corner} \\ 0 & \text{elsewhere} \end{cases} \qquad (5)$$

where $f \in \{u, v\}$. We then use the reference image to cross-bilateral filter the maps. However, while this propagates the flow in an edge-aware fashion, generating the two filtered maps $\tilde{P}_f$, it will affect the value of the flow at the location of the corners, which should not change. Therefore, we use a normalization map

$$N(\mathbf{p}) = \begin{cases} 1 & \text{if } \mathbf{p} \text{ is a corner} \\ 0 & \text{elsewhere} \end{cases} \qquad (6)$$

The final flow $F$ can then be computed as $F_f = \tilde{P}_f / \tilde{N}$, where $\tilde{N}$ is the cross-bilateral filtered version of $N$.
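
The normalized propagation of Equations 5 and 6 can be illustrated on a tiny image. The sketch below swaps in a brute-force cross-bilateral filter with Gaussian spatial and range weights, which is a stand-in for the fast filter of Gastal and Oliveira [6] and is only practical for very small inputs; the names and the `sigma_r` parameter are assumptions.

```python
import math


def cross_bilateral(values, guide, sigma_s, sigma_r):
    """Unnormalized cross-bilateral sum of values, weighted by the guide."""
    h, w = len(values), len(values[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for v in range(h):
                for u in range(w):
                    if values[v][u] == 0.0:
                        continue  # only corner pixels contribute
                    ds = ((x - u) ** 2 + (y - v) ** 2) / (2.0 * sigma_s ** 2)
                    dr = (guide[y][x] - guide[v][u]) ** 2 / (2.0 * sigma_r ** 2)
                    acc += math.exp(-(ds + dr)) * values[v][u]
            out[y][x] = acc
    return out


def propagate(sparse, guide, sigma_s, sigma_r):
    """Densify one flow component given as {(x, y): value} at the corners."""
    h, w = len(guide), len(guide[0])
    P = [[0.0] * w for _ in range(h)]  # Equation 5
    N = [[0.0] * w for _ in range(h)]  # Equation 6
    for (x, y), f in sparse.items():
        P[y][x] = f
        N[y][x] = 1.0
    Pf = cross_bilateral(P, guide, sigma_s, sigma_r)
    Nf = cross_bilateral(N, guide, sigma_s, sigma_r)
    return [[Pf[y][x] / Nf[y][x] if Nf[y][x] > 0.0 else 0.0
             for x in range(w)] for y in range(h)]
```

Dividing by the filtered normalization map turns the filtered values into a weighted average of the corner flows, so a region with a single corner inherits exactly that corner's flow, and the intensity edge in the guide keeps flows from leaking across it.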

2.5. Error-tolerant image fusion 

If the number of matches between reference and source images is low, or if a particular area of the scene is textureless, the quality of the flow propagation described in Section 2.4 can deteriorate because the accuracy of the flow is affected by the spatial distance over which it needs to propagate. To detect and compensate for errors that may arise in such cases, we propose a simple modification of the exposure fusion algorithm proposed by Mertens et al. [13]. In addition to weights for contrast, color saturation, and well-exposedness, we add a fourth weight that reflects the quality of the registration. Specifically, we choose to use the structural similarity index (SSIM) [18]. Note that computing the SSIM map only requires performing five convolutions with Gaussian kernels and a few other parallelizable operations such as pixel-wise image multiplication and sum; this makes a GPU implementation of SSIM extremely efficient (see Section 3 for an analysis of its runtime).
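
A minimal sketch of the SSIM map shows where the five Gaussian convolutions come from: one blur each of x, y, x·x, y·y and x·y. It assumes images with values in [0, 1], the usual SSIM constants, and clamped borders; the kernel size and the function names are illustrative choices, not the authors' settings.

```python
import math


def gaussian_kernel(sigma, radius):
    k = [math.exp(-(i * i) / (2.0 * sigma * sigma))
         for i in range(-radius, radius + 1)]
    total = sum(k)
    return [v / total for v in k]


def blur(img, kernel):
    """Separable Gaussian blur with clamped borders."""
    r = len(kernel) // 2
    h, w = len(img), len(img[0])
    tmp = [[sum(kernel[j + r] * img[y][min(max(x + j, 0), w - 1)]
                for j in range(-r, r + 1)) for x in range(w)] for y in range(h)]
    return [[sum(kernel[j + r] * tmp[min(max(y + j, 0), h - 1)][x]
                 for j in range(-r, r + 1)) for x in range(w)] for y in range(h)]


def ssim_map(a, b, sigma=1.5, radius=3, c1=0.01 ** 2, c2=0.03 ** 2):
    """Per-pixel SSIM between two images with values in [0, 1]."""
    h, w = len(a), len(a[0])
    k = gaussian_kernel(sigma, radius)

    def product(p, q):
        return [[p[y][x] * q[y][x] for x in range(w)] for y in range(h)]

    # The five Gaussian convolutions.
    mu_a, mu_b = blur(a, k), blur(b, k)
    aa, bb, ab = blur(product(a, a), k), blur(product(b, b), k), blur(product(a, b), k)
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ma, mb = mu_a[y][x], mu_b[y][x]
            var_a, var_b = aa[y][x] - ma * ma, bb[y][x] - mb * mb
            cov = ab[y][x] - ma * mb
            out[y][x] = ((2 * ma * mb + c1) * (2 * cov + c2)) / (
                (ma * ma + mb * mb + c1) * (var_a + var_b + c2))
    return out
```

How the map enters the fusion is not spelled out above; one plausible choice, stated here as an assumption, is to multiply the existing per-pixel weights by the SSIM value clamped to be non-negative, so that poorly registered pixels of the warped source contribute little to the result.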

Figure 3 shows an example of failure of the edge-aware propagation stage, and how our error-tolerant fusion can detect and compensate for it.

2.6. Implementation details 

For the matcher, we create up to 5 pyramid layers (we stop early if the coarsest level falls below 100 pixels in either height or width). The patch comparison is computed on 21 × 21 patches, and the maximum search radius at each level is 10. We evaluate the cornerness within a patch on a regular grid of points spaced by 1/16th of the tile size. We implemented the method described here, from capture to generation of the HDR image, with a mixture of C++ and CUDA code. Specifically, the matcher and the weeding stage run on the CPU, with the weeding stage being multi-threaded. The remaining parts are heavily CUDA-based. Finally, for the sparse-to-dense stage, it is important to use a large spatial standard deviation to make sure that the flow can be propagated to regions poor in number of correspondences; we use $\sigma_s = 400$.

[Figure 3 panels: (a) Blended stack; (b) Source warped; (c) Final HDR]

Figure 3: A failure case of the flow propagation stage. The input images are first blended to show the original displacement (a). The warped source produced by our algorithm presents a few artifacts, a couple of which are marked. However, our error-tolerant exposure fusion can detect and correct for those errors.

3. Evaluation and Results 

In this section we evaluate the performance of our algorithm, both in terms of quality of the result and execution time, by means of comparisons with state-of-the-art methods. Note that we perform histogram equalization on all the input images to attenuate brightness differences.

3.1. Quality comparisons 

We are particularly interested in evaluating the quality of our matcher, and the quality of the final result when compared against other non-rigid registration methods.

The matcher. One of our claims pertains to the robustness of our matcher in particularly low-light situations. Figure 4 shows a comparison between our matcher and SIFT [11]. In terms of robustness, SIFT is arguably the state-of-the-art method for finding correspondences between images. And indeed it can produce reliable correspondences even in the presence of large displacements, where our matcher would fail. However, when one of the two images is extremely dark, SIFT may fail dramatically, as shown in Figures 4c and 4h. One can filter the resulting matches in a manner similar to the one we propose in Section 2.3. Nevertheless, in extreme cases such as those shown in the figure, the correspondences may be so poor that after the filtering stage, too few are left to perform an accurate warp; the low number and quality of the correspondences shown in Figures 4d and 4i, for instance, are the cause of the artifacts visible in the first and second row of Figure 8d. On the contrary, our method still produces high-quality correspondences. This ability is key to the success of the registration of HDR stacks.

Non-rigid registration algorithms. The context of our method is different from that of algorithms that aim at achieving a high-quality result, even in the presence of large displacements. However, we still compare on cases that are within the scope of our paper; for the comparison we pick the algorithms that can deliver the best quality [16, 9], and the fastest non-rigid registration algorithm of which we are aware [1].

[Figure 5 panels: (a) Reference; (b) Source 1; (c) Source 2; (d) Sen et al. [16]; (e) Hu et al. [9]; (f) Our result]

Figure 5: Comparison with the state-of-the-art methods for non-rigid registration. To produce a result that is visually comparable to the related work, we use the tonemapping operator proposed by Mantiuk et al. [12], rather than our modified exposure fusion, see Section 3.1; however, the color differences that are still visible are solely due to the tonemapper parameters.

Figure 5 shows a comparison with competitors that deliver the highest quality. To perform it, we registered the images in the stack to the shortest exposure. Note that “Source 2” is 4 stops brighter than the reference, and yet our method correctly warps it; the other two methods use the middle exposure as the reference. Also, to simplify the task of visually comparing the results of the three approaches, rather than using the modified version of exposure fusion that we described in Section 2.5, we output the warped images, create an HDR irradiance map, and use the tonemapper proposed by Mantiuk et al. [12]. The quality of the sky in our result is comparable with that of Hu et al., and better than that of Sen et al.: the sun is still present and there are no halos. Note that some of the people walking under the dome are not correctly registered by our method; both the other results correctly register that region. However, as mentioned above, in this example we did not run our error-tolerant fusion, which would take care of that problem.

[Figure 4 panels: (f) Reference; (g) Source; (h) SIFT; (i) SIFT clean; (j) Ours]

Figure 4: The matcher we propose performs particularly well when searching for correspondences in extremely dark areas, as is needed for large portions of the two stacks shown here. SIFT fails to find reliable correspondences; a solution could be to only retain the matches that support a homography, here indicated as “SIFT clean”. However, if the quality of the original matches is too low, very few correspondences survive the cleaning stage, as is the case shown in (d) and (i). Our method produces a uniformly distributed set of matches. Note that both methods were fed images that were histogram equalized.

A method more similar in spirit to ours is the flow algorithm recently proposed by Bao et al. [1]. While not specifically designed for HDR registration, their algorithm is impressively fast (see Section 3.2). At its core, the method by Bao and colleagues uses PatchMatch to deal with large displacements [2]. To improve the flow accuracy in occlusion and disocclusion regions, they compute the matching cost in an edge-aware fashion; at the same time they improve on speed by computing the cost only at a wisely selected subset of pixels. Note that, despite being several times faster than the competitors, the method by Bao and colleagues ranks within the top ten positions in all of the established flow benchmark datasets. We compare our method with theirs on cases that are within the scope of both algorithms.

Figure 1 shows a fairly common case for an HDR stack, with both camera motion and slight scene motion (the woman is holding the volleyball). In all the comparisons with their method, we first equalize the images to compensate for illumination changes. The method by Bao and colleagues produces strong artifacts that are visible in Figure 1(c); on the contrary our method registers the images perfectly. Note that the original images are 5MP images, which is possibly larger than what their method was originally designed for; please see the additional material for more comparisons, including lower resolution stacks.

Figure 6 shows another comparison, this time with both algorithms running on a VGA stack. In order to perform a fairer comparison, the images were taken with a small, 1-stop separation, and neither of them presents saturation; because of the limited dynamic range and spacing of the exposure times of this example, histogram equalization makes the source and the reference essentially identical in terms of brightness. The insets of the figure show that the method by Bao et al. fails in preserving the local structure of the tubes.

Figure 6: Comparison with the method by Bao et al. [1]. Images (b) and (c) are the source images warped with the method by Bao et al. and by our algorithm respectively. Notice the artifacts affecting the results by Bao et al.

On both examples, our algorithm produces a more accurate registration. Figure 8 shows more results of our method.

3.2. Execution time 

One of the biggest strengths of our method is its computational efficiency. We first validate this claim by comparing the runtime of our algorithm to three related works. For this experiment, we used VGA images. Two preliminary comments are in order. First, the methods by Sen et al. and Hu et al. are implemented partly in Matlab, which makes them intrinsically slower. However, the speedup is significant even when accounting for that. Second, the execution times shown in Table 1 for their methods are those reported by Oh et al. [14].

Algorithm          Execution time   Speedup
Our algorithm      49ms             -
Bao et al. [1]     171ms            ≈3.5×
Sen et al. [16]    106*s            > 1,900*×
Hu et al. [9]      94*s             > 2,000*×

Table 1: Comparison of the execution time with different state-of-the-art algorithms. The tests were run on VGA images. The * indicates execution times for implementations that are partly in Matlab.

Figure 7: Computational time of the algorithm by Bao et al. [1] and ours. Note that our method grows sublinearly. The timings were captured on an NVIDIA GTX Titan graphics card.

A more interesting comparison is with the method of Bao and colleagues, both because they implemented their algorithm very efficiently in CUDA, and because execution speed is one of their main focuses. Indeed, recall that, to the best of our knowledge, theirs is the fastest published method for optical flow. And yet, our method is roughly 3.5× faster. This, however, is only a partial evaluation: while the execution time of our algorithm grows sublinearly, theirs grows linearly with the number of pixels, as shown in Figure 7. On an NVIDIA GTX Titan, for a pair of 5MP images, their code runs in 1.66s; our method registers the same images in 150ms, which translates to a speedup of 11×.

Table 2 shows the cost of each step of our algorithm on a desktop machine as well as a tablet, both for pairs of 5MP images. Aside from rigid registration methods, we are not aware of any published work capable of registering two 5MP images in a time even close to a second on a desktop. Our approach can do it in less than that (677ms) on a commercial tablet.

Moreover, as shown in Figure 7, our method scales well with image size; this is a particularly attractive feature, given the rate at which the number of pixels in widely available sensors is growing.


Step of the algorithm             Tablet   Desktop
Matcher (Sec. 2.2)                132ms    49ms
Match weeding (Sec. 2.3)          23ms     20ms
Sparse-to-dense flow (Sec. 2.4)   473ms    67ms
Fusion weights (Sec. 2.5)         49ms     11ms
Total time                        677ms    147ms

Table 2: Computational time for each step of the algorithm when run on a pair of 5MP images. The reference tablet is an NVIDIA SHIELD Tablet, which is equipped with a Tegra K1 system-on-chip. The timings on desktop were measured on an Intel i7 CPU with an NVIDIA GTX Titan graphics card.


4. Conclusions 

In the space of registration for HDR imaging, and stack-based photography in general, it is difficult to find an acceptable trade-off between registration accuracy and computational load. We propose a new compromise: rather than attempting to solve the most general non-rigid registration case, we focus on the more typical case of relatively small displacement, and propose a locally non-rigid registration technique. Specifically, we contribute a method that is 11× faster than the fastest published method, while producing a more accurate registration. Our approach is also the only one that can perform non-rigid registration within the computational power of a mobile device. To achieve this result, we developed a novel, fast feature matcher that works better than the state-of-the-art when the reference image is underexposed. Our matcher comprises an original light-weight corner detector, and a matching strategy based on a modification of the RANSAC algorithm. We think that this matcher may be useful for other applications as well. Finally, we implement the complete system, from capture to HDR generation, on an NVIDIA SHIELD Tablet. This also involves a metering strategy, a flow propagation step, and a deghosting strategy to compensate for errors in the flow propagation.

[Figure panels: (a) Reference; (b) Source; (c) Blended Stack; (d) Homogr. + Blend; (e) Our Align. + Blend; (f) Our Final HDR]